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ABSTRACT 

GELBANK is a publicly available database of two- 
dimensional gel electrophoresis (2DE) gel patterns 
of proteomes from organisms with known genome 
information (available at http://gelbank.anl.gov 
and ftp://bioinformatics.anl.gov/gelbank/). Currently 
it includes 131 completed, mostly microbial prote- 
omes available from the National Center for 
Biotechnology Information. A web interface allows 
the upload of 2D gel patterns and their annotation 
for registered users. The images are organized by 
species, tissue type, separation method, sample 
type and staining method. The database can be 
queried based on protein or 2DE-pattern attributes. 
A web interface allows registered users to assign 
molecular weight and pH gradient profiles to their 
own 2D gel patterns as well as to link protein identi- 
fications to a given spot on the pattern. The website 
presents all of the submitted 2D gel patterns where 
the end-user can dynamically display the images or 
parts of images along with molecular weight, pH 
profile information and linked protein identification. 
A collection of images can be selected for the cre- 
ation of animations from which the user can select 
sub-regions of interest and unlimited 2D gel 
patterns for visualization. The website currently 
presents 233 identifications for 81 gel patterns for 
Homo sapiens, Methanococcus jannaschil, Pyro- 
coccus furiosus, Shewanella oneidensis, Escherichia 
colt and Deinococcus radfodurans. 



INTRODUCTION 

The theoretical proteome of an organism is the collection of 
potential open reading frames (ORFs). At a given time and in a 
given state of a cell not all the potential ORFs are expressed. 
One area of proteomics is the characterization of protein 
expression profiles under various conditions. Proteome analy- 
sis requires the isolation of the complete proteome, separation 
of complex protein mixtures into discrete protein components, 
measurement of the relative abundance and identification of 



each protein component. The most widely used separation 
method for proteome analysis is currently two-dimensional 
gel electrophoresis (2DE) (1), where the proteins are separated 
according to their isoelectric point (pi) in the first dimension 
and t&sk poj^cular weight (MW) in the second dimension. 
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proteins, these 2DE gel patterns provide information of 
relative abundance, MW and pi of isolated proteins in the 
given pi and MW range. The stained protein spots can be 
identified by peptide mass fingerprinting by comparing the 
peptide masses of digested proteins cut out of 2DE gels with 
the predicted peptide masses from the corresponding genome 
ORF database. 

In the process of analysis of a 2DE gel pattern certain 
questions arise. Given the complete genome and potential 
ORF information, what is the most likely identity of a protein 
at a given location on a gel? What are the predicted proteins 
resolved in a given pi and MW range? We have calculated 
theoretical MW and pi values for all the ORFs for 131 
organisms with completed genome information. A search 
engine was built to allow the user to sort proteins based upon 
their physical characteristics and their annotation data. 
Theoretical 2DE patterns are also presented with the ability 
to change both the pi and MW ranges dynamically. 

Our laboratory has extensive expertise in 2DE separation of 
proteins. We have studied protein expression in a variety of 
organisms including human, rodent and microbial samples. 
GELBANK not only displays 2DE gel patterns generated in 
our lab, but it also allows the display and annotation of 2DE 
gel patterns of registered users with the intent to centralize 
2DE gel patterns from around the world. This site is similar 
to the well-known SWISS- 2DP AGE which contains more 
than 30 reference maps with more than 1000 identifications 
(4). SWISS-2DPAGE has reference maps for mostly eukary- 
otes (human, mouse, Arabidopsis thaliana, Dictyostelium 
discoideum and Saccharomyces cerevisiae), whereas 
GELBANK has reference maps for mostly prokaryotes 
{Methanococcus jannaschii, Pyrococcus furiosus, 
Shewanella oneidensis, Escherichia coli and Deinococcus 
radiodurans). The inclusion of Swiss-Prot identification on the 
ORF display (see below) provides a gateway to the powerful 
ExPASy server (5). 
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Figure 1. 2DE gel pattern display. The web interface displays a dynamically sizeable image. If profile information is provided, the MW and/or pi values are 
also displayed dynamically. Spot annotation is displayed by moving the cursor over the spot. Details of the identification (identification method, MW and pi 
deviaUon from theoreUcal values) are available on the same display. 



DESCRIPTION 

The GELBANK website features six major sections: 

(i) Proteomes: a collection of interfaces for proteome queries; 

(ii) 2D Gels: interfaces for 2DE gel pattern queries; (iii) Tools: 
a collection of tools pertinent to proteome analysis by 2DE; 
(iv) Upload: web interfaces for registered users for manipu- 
lation of their 2DE patterns; (v) Bio-bag: an interface similar 
to a shopping basket for easy storage of various items from the 
website; and (vi) FTP site: for the access of data in various 
formats. The website was tested with Microsoft's Internet 
Explorer and some features (2DE annotation) work only with 
this browser. 

The sections presented above are described in greater detail 
next. 

Proteomes 

The potential proteomes of 131 organisms with completed 
genomes are stored in an ORACLE9i database (currently 
464 770 records). The theoretical MW, pi and potential trans- 
membrane regions of ey^ QRF was .c&feurid&d using the 
SOSUI sstver (6). The pmts&ms* can b<s «««^ched MW and 
pi r^|«v ORE annotation, % sequence fragments or 
patterns selecting a single organism or a collection of 
organisms. A single protein or a set of proteins can be 
selected from the result set and used as input for analysis tools 



such as multiple alignment (ClustalW) (7), charge distribution 
at a given pH (see below) or storage for later use in Bio-bag 
(see below). A summary display for individual entries is also 
available including links to other publicly available databases 
(supplementary fig. SI). This display presents a link to the 
collection of 2DE gel patterns if available for the given 
organism. If the given ORF is associated with a spot on a given 
2DE gel pattern by the annotator, a link is presented to the 
given pattern(s). 

2DE gels 

2DE images are uploaded, profiled and annotated by regis- 
tered users. These high-resolution 2DE images are displayed 
in a dynamic fashion. 2DE protein patterns are dynamically 
sizeable from a width of 300 up to 1500 pixels, with image 
sizes ranging from 30 to 800 kb, respectively. This feature 
accommodates users with either a slow modem connection or 
a high-speed connection. The user is able to zoom into any 
region of the 2DE gel pattern with the corresponding MW and 
pi scale automatically updated. The gel database can be 
searched by organism, sample type and annotation. Since 
gels are profiled in both the MW and pi dimensions the 
approximation of these properties for a given spot is available 
(Fig. 1). Spots can be annotated by assigning a single ORE or 
multiple OREs from the database. Once a gel has associated 
annotation, this information is also displayed on the website. If 
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PROT [ 01J 

CI: 18892121 
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Figure 2. Example of a database entry. All sequences, 2DE gel patterns and their annotation data can be downloaded from the FTP site (ftpV/bioinformatics 
anl.gov/gelbank/). An example of an annotated 2DE gel pattern is shown. 



the gel has been profiled in both dimensions and an annotation 
for a given spot is available, the deviation of observed pi and 
MW of a given spot can be determined by comparing them 
with the theoretical values. The deviation is displayed in a 
dynamic fashion along with the identification method used. A 
link is provided for the detailed ORF information. 

GELBANK allows the user to assign multiple identifica- 
tions for a given spot on a 2DE gel pattern and displays all the 
associated data. The majority of spots analyzed in our 
laboratory provide multiple unique hits using tandem mass 
spectrometry. This might be due to the fact that some of the 
species are extremophiles and the solubilization method used 
might not dissociate complexes completely. In other cases the 
pH gradient used might be too broad for successful separation 
of proteins with similar physicochemical properties. 

The dynamic display of 131 in silico patterns is also 
presented on the website. 

Tools 

This part of the website has a small collection of proteomics 
tools relevant to 2DE. In the process of searching the database, 
a collection of proteins can be selected. These can be saved as 
a list of sequences (Bio-bag) and can be analyzed later by tools 
implemented on the website. For example, using ClustalW 
allows the alignment of these sequences. In certain applica- 
tions it is important to know the predicted pi and MW of a 
protein or part of the protein (degradation product). The 
titration curve and MW of a given complete or partial ORF can 
be displayed in a dynamic display. A utility allows the 
simultaneous display of titration curves of an unlimited set of 
selected ORFs, which is helpful in cases where the optimum 
pH is searched for the highest difference in charge between 
two or more proteins (e.g. in ion-exchange chromatography). 



Searching GELBANK can also return a list of images. 
These can also be saved and analyzed by a gel animation tool. 
Gel animation allows the selection of multiple IDE patterns, 
creating a virtual movie. The display of the image collection 
can be controlled in several ways: (i) change of display time 
for each frame, (ii) direction of the movie, etc. A web interface 
allows the user to select a sub-region from each image and 
play the modified animation. Similar to 2DE patterns, 
animations can also be saved to Bio-bag (see below). 

Some of the tools used were developed at the ProteomeWeb 
website (8) (http://ProteomeWeb.anl.gov) and shared between 
the two websites. 

Upload 

As mentioned above, GELBANK is accepting the upload of 
2DE gel patterns from registered users. Users can upload their 
images using a browser interface. GELBANK currently 
accepts images only in 8-bit PNG format. The current 
limitation for file size is 1.4 Mb, which corresponds to a 
fairly high-resolution image (~ 1800 X 1800 pixels). An online 
tool allows the profiling of the uploaded image by selecting 
markers on the gel and assigning MW and pi values to those 
points in the image. Several curves are then fitted to the data 
(first to fifth order polynomials) with the corresponding 
correlation coefficients and limit values for the given 
polynomial. The user can select the fit that best describes a 
given gel, and these data are stored in the ORACLE9i 
database. A critical feature of any proteome study is the 
identification of the proteins detected; therefore if annotation 
data are available, an online tool allows the assignment of an 
ORF or a collection of ORFs to a given protein spot, thereby 
linking the 2DE gel pattern to the proteome database. The 
interface allows the assignment of multiple identifications for 
a given spot. Assignments can be edited using the same 
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interface. Registered users can add new properties (e.g. 
identification method for spot ORF assignment, method for 
separation in the first dimension, etc.) if not listed already, 
giving great flexibility while maintaining database integrity as 
all attributes are entered from a list of possibilities. All data are 
stored in the database for quick retrieval. 

Bio-bag 

Query interfaces presented on the website allow the storage of 
objects for later use. These objects currently are: protein 
sequences, 2DE gel patterns and animations (using intact 
patterns or sub-regions of patterns). When a registered user is 
logged into the system these objects are saved in the database 
and can be recalled later (supplementary fig. S2), otherwise 
they are deleted after 30 min of inactivity. A few tools are 
available by selecting a number of these objects: (i) multiple 
sequence alignment, titration curves for ORFS, (ii) animation 
for selected 2DE patterns, etc. Contents of the Bio-bag can be 
deleted if necessary. 

FTP site 

The entire content of the website is available for download 
from our FTP site (ftp://bioinformatics.anl.gov/gelbank/). The 
data are presented in three different formats: (i) simple tab- 
delimited file (for easy parsing of the information), (ii) flat text 
format that is easily read by humans as well as by computers, 
and (iii) ORACLE9i database dumps (Fig. 2). The 2DE 
patterns are also available for download. The FTP site is 
updated on weekly basis. 

CONCLUSION 

A 2DE gel pattern database is presented that ties directly to a 
proteomics database of species with completed genome 
information. The public database allows the submission of 
gel patterns by registered users and their annotation using a 
web interface. The proteomes can be queried and the resulting 
data set can be used as an input for some proteomics tools that 
are pertinent to 2D gel electrophoresis. A new 'shopping 
basket' is introduced (Bio-bag) that allows the storage of 
objects presented on the website. Gelbank's content can be 



downloaded in various forms from the public FTP site. Future 
releases will include more proteomics tools as well more 
detailed gel pattern annotations. In addition, once available, 
Open Source 2DE pattern analysis software will be imple- 
mented on the website and users will be able to get analysis 
reports on submitted gel patterns. 
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